Chaofan Tao

GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning

Apr 22, 2026
Via arXiv

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Apr 11, 2026
Via arXiv

Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

Mar 18, 2026
Via arXiv

DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

Feb 23, 2026
Via arXiv

BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models

Feb 04, 2026
Via arXiv

OVD: On-policy Verbal Distillation

Jan 29, 2026
Via arXiv

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

Jan 26, 2026
Via arXiv

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Jan 20, 2026
Via arXiv

MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

Jan 18, 2026
Via arXiv

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Jan 07, 2026
Via arXiv